Developing Architectural Documentation for the Hadoop Distributed File System
Authors
Abstract
Many open source projects lack architectural documentation that describes the major pieces of the system, how they are structured, and how they interact. We have produced architectural documentation for the Hadoop Distributed File System (HDFS), a major open source project. This paper describes our process and experiences in developing this documentation. We illustrate the documentation we have produced, and how it differs from existing documentation, by describing the redundancy mechanisms HDFS uses for reliability.
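The redundancy mechanism referred to here is HDFS block replication: files are split into blocks, each block is stored on several DataNodes, and the NameNode re-replicates blocks whose replica count drops. As a hedged illustration, not drawn from the paper's documentation, the Java sketch below shows how a client can influence replication; the NameNode URI, file path, and replication factors are hypothetical, and a reachable cluster is assumed.

// Sketch: adjusting HDFS block replication from a Java client.
// The URI, path, and replication factors below are illustrative only.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replication factor for files created by this client.
        conf.setInt("dfs.replication", 3);

        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Request a different replication factor for an existing file; the
        // NameNode then schedules DataNodes to add or remove block replicas.
        fs.setReplication(new Path("/data/example.txt"), (short) 2);

        fs.close();
    }
}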
Similar resources
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. The current Hadoop and the existing Hadoop Distributed File System rack-aware data placement strategy in MapReduce assume a homogeneous Hadoop cluster, in which each node has the same computing capacity and the same workload is assigned to each node. Default Hadoop d...
HBase and Hypertable for Large Scale Distributed Storage Systems: A Performance Evaluation for Open Source BigTable Implementations
BigTable is a distributed storage system developed at Google for managing structured data, with the capability to scale to a very large size: petabytes of data across thousands of commodity servers. As of now, there exist two open-source implementations that closely emulate most of the components of Google's BigTable, namely HBase and Hypertable. HBase is written in Java and provides BigTable like ...
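To make the "BigTable-like" data model concrete, the sketch below shows a single write and read through the HBase Java client (the 1.x+ API, which postdates this evaluation). The table name, column family, qualifier, and row key are hypothetical, and a running HBase cluster with its configuration on the classpath is assumed.

// Sketch: one write and one read with the HBase Java client.
// Table, column family, qualifier, and row key are hypothetical.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("webtable"))) {

            // Write one cell: (row key, family:qualifier) -> value.
            Put put = new Put(Bytes.toBytes("com.example.www"));
            put.addColumn(Bytes.toBytes("contents"), Bytes.toBytes("html"),
                          Bytes.toBytes("<html>...</html>"));
            table.put(put);

            // Read the same cell back.
            Result result = table.get(new Get(Bytes.toBytes("com.example.www")));
            byte[] value = result.getValue(Bytes.toBytes("contents"), Bytes.toBytes("html"));
            System.out.println(Bytes.toString(value));
        }
    }
}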
Implementing Algorithmic Skeletons over Hadoop
In the past few years, there has been a growing interest in storing and processing vast amounts of data that often exceed the petabyte scale. To that end, MapReduce, a computational paradigm that was introduced by Google in 2003, has become particularly popular. It provides a simple interface with two functions, map and reduce, for developing and implementing scalable parallel applications...
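The two-function interface mentioned above is easiest to see in the canonical word-count example. The sketch below uses Hadoop's Java MapReduce API; it is illustrative only, omits the job-submission boilerplate, and the class names are hypothetical.

// Sketch: the map and reduce functions of word count in Hadoop's Java API.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // map: (byte offset, line of text) -> (word, 1) for every word in the line
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // reduce: (word, [1, 1, ...]) -> (word, total count)
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}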
An Efficient Approach to Optimize the Performance of Massive Small Files in Hadoop MapReduce Framework
The most popular open source distributed computing framework, called Hadoop, was designed by Doug Cutting and his team; it involves thousands of nodes to process and analyze huge amounts of data called Big Data. The major core components of Hadoop are HDFS (Hadoop Distributed File System) and MapReduce. This framework is the most popular and powerful for storing, managing, and processing Big Data appl...
Ceph as a scalable alternative to the Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) has a single metadata server that sets a hard limit on its maximum size. Ceph, a high-performance distributed file system under development since 2005 and now supported in Linux, bypasses the scaling limits of HDFS. We describe Ceph and its elements and provide instructions for installing a demonstration system that can be used...
Journal:
Volume / Issue:
Pages: -
Publication year: 2011